Interactive optimization of embedding-based text similarity calculations

Abstract

Comparing text documents is an essential task for a variety of applications within diverse research fields, and several different methods have been developed for this. However, calculating text similarity is an ambiguous and context-dependent task, so many open challenges still exist. In this paper, we present a novel method for text similarity calculations based on the combination of embedding technology and ensemble methods. By using several embeddings, instead of only one, we show that it is possible to achieve higher quality, which in turn is a key factor in developing high-performing applications that exploit text similarity. We also provide a prototype visual analytics tool that helps the analyst find optimally performing ensembles and gain insights into the inner workings of the similarity calculations. Furthermore, we discuss the generalizability of our ideas to fields beyond the scope of text analysis.
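
The abstract does not say how the individual embeddings are combined, so the following is only an illustrative sketch of the general idea of an embedding ensemble, not the authors' method. It assumes each document has already been encoded by several embedding models, and it combines the per-embedding cosine similarities with a weighted average. The names `cosine_similarity`, `ensemble_similarity`, `emb_x`, and `emb_y` are hypothetical and chosen for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ensemble_similarity(doc_a, doc_b, weights):
    """Weighted average of per-embedding cosine similarities.

    doc_a, doc_b: dicts mapping an embedding name to that document's vector.
    weights: dict mapping the same embedding names to non-negative weights.
    The weighted average is an assumption made for this sketch.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = sum(
        w * cosine_similarity(doc_a[name], doc_b[name])
        for name, w in weights.items()
    )
    return combined / total


# Toy vectors standing in for the outputs of two hypothetical embedding models.
doc_a = {"emb_x": [1.0, 0.0], "emb_y": [1.0, 1.0]}
doc_b = {"emb_x": [0.0, 1.0], "emb_y": [1.0, 1.0]}

equal = ensemble_similarity(doc_a, doc_b, {"emb_x": 1.0, "emb_y": 1.0})
only_y = ensemble_similarity(doc_a, doc_b, {"emb_x": 0.0, "emb_y": 1.0})
print(equal, only_y)
```

In this toy case the two embeddings disagree completely about the document pair, so the result depends entirely on the weights. Choosing such weights is the kind of decision the abstract says the visual analytics tool is meant to support.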

Similar resources

Link Prediction using Network Embedding based on Global Similarity

Background: Link prediction is one of the most widely studied problems in complex network analysis. Link prediction requires knowing the background of previous link connections and combining them with available information. Local link prediction approaches with node structure objectives are fast but not accurate enough. On the other hand, the global link predicti...

Scalable Ordinal Embedding to Model Text Similarity

Practitioners of Machine Learning and related fields commonly seek out embeddings of object collections into some Euclidean space. These embeddings are useful for dimensionality reduction, for data visualization, as concrete representations of abstract notions of similarity for similarity search, or as features for some downstream learning task such as web search or sentiment analysis. A wide a...

Simbed: Similarity-Based Embedding

Simbed, standing for similarity-based embedding, is a new method of embedding high-dimensional data. It relies on the preservation of pairwise similarities rather than distances. In this respect, Simbed can be related to other techniques such as stochastic neighbor embedding and its variants. A connection with curvilinear component analysis is also pointed out. Simbed differs from these methods...

Interactive Textbooks; Embedding Image Processing Operator Demonstrations in Text

Traditional image processing teaching has used materials where the theory and drill are separated into textbooks and image processing packages. HTML and JAVA might allow easier construction of an integrated teaching resource. Such a resource would have widespread, platform-independent accessibility. This paper reports our assessment of this potential, which is explored through extensions of the...

Features Based Text Similarity Detection

As the Internet helps us cross cultural borders by providing different information, plagiarism issues are bound to arise. As a result, plagiarism detection is increasingly in demand to address this issue. Different plagiarism detection tools have been developed based on various detection techniques. Nowadays, the fingerprint matching technique plays an important role in those detection tools. However, ...

Journal

Journal title: Information Visualization

Year: 2022

ISSN: 1473-8716, 1473-8724

DOI: https://doi.org/10.1177/14738716221114372